Wearable-based accelerometer activity profile as digital biomarker of inflammation, biological age, and mortality using hierarchical clustering analysis in NHANES 2011–2014

Repeated disruptions of circadian rhythms are associated with adverse health outcomes and reduced longevity. The use of wearable devices to quantify circadian rhythm from continuously collected data, and to elucidate its connection to longevity, remains largely unstudied. In this work, we investigate a data-driven segmentation of 24-h accelerometer activity profiles from wearables as a novel digital biomarker for longevity in 7,297 U.S. adults from the 2011–2014 National Health and Nutrition Examination Survey. Using hierarchical clustering, we identified five clusters and described them as follows: “High activity”, “Low activity”, “Mild circadian rhythm (CR) disruption”, “Severe CR disruption”, and “Very low activity”. Young adults with extreme CR disturbance appear healthy, with few comorbid conditions, but in fact show higher white blood cell, neutrophil, and lymphocyte counts (0.05–0.07 log-unit, all p < 0.05) and accelerated biological aging (1.42 years, p < 0.001). Older adults with CR disruption show significantly increased systemic inflammation indexes (0.09–0.12 log-unit, all p < 0.05), advanced biological aging (1.28 years, p = 0.021), and higher all-cause mortality risk (HR = 1.58, p = 0.042). Our findings highlight the importance of circadian alignment for longevity across all ages and suggest that data from wearable accelerometers can help identify at-risk populations and personalize treatments for healthier aging.

medicine, which emphasizes prediction, prevention, personalization, and participation over a one-size-fits-all approach 14 .
The circadian rhythm, an endogenous 24-h cycle regulated by the master clock in the suprachiasmatic nucleus of the brain, has also been recognized as a crucial factor for maintaining optimal health and healthspan 15 . The circadian rhythm regulates various physiological, biological, and behavioral processes in the body, including sleep-wake cycles, hormone production, metabolism, and immune function 16 . Although external time cues ("zeitgebers") such as the 24-h light-dark cycle can influence the circadian rhythm, it is predominantly controlled by endogenous factors, which are deeply rooted in an individual's genetic makeup. Emerging evidence strongly suggests that disturbance or misalignment of circadian rhythms has profound implications for health outcomes, including disrupted metabolism and hormone regulation as well as an increased risk of chronic diseases such as metabolic syndrome, diabetes, cardiovascular disease, and cancer 17 . In addition, it has been linked to immune deficiency, chronic inflammation, obesity, fatigue, and a higher likelihood of experiencing sleep disorders 18-21 . As a result, maintaining a healthy circadian rhythm is crucial for overall health and well-being, reducing the risk of adverse health effects and improving quality of life 22,23 . Considering the association between circadian rhythms and lifespan, along with the widespread adoption of recent technological advancements, we argue that smartwatches present a promising opportunity for leveraging digital biomarkers for longevity 24 . Smartwatches provide a practical means of continuously monitoring accelerometer data 25 and heart rate data 26 , offering valuable insights into circadian rhythms.
Utilization of consumer smartwatches for data collection and analysis of potential digital biomarkers is, however, limited by a number of factors such as proprietary algorithms, limited data ownership, short lifespan, and variable wear time. Hence, ActiGraph devices, which are designed for research purposes, allow us to fully investigate the potential of future applications that could be implemented on these digital devices by using developed algorithms.
Furthermore, the application of machine learning (ML) to continuously collected data from wearables elucidates hidden patterns as digital phenotypes and facilitates subpopulation identification 27 . Conventional, expert-driven classification of disease or at-risk populations is limited by the lack of agreed methods for determining the number of natural clusters in the populations of interest and for choosing the variables on which to base segmentation 28,29 . Instead, the use of a holistic and data-driven clustering approach has gained recognition as an alternative 29 . That is, each individual exists within multiple classes of health levels and provides various modalities of digitally measured physiological and behavioral data, which then correspond to multiple clusters of health status. Similar to approaches pioneered in genomics, this method can advance our understanding of the many complex components of disease etiology. To summarize, digital biomarkers and data-driven clustering approaches enable the use of precision medicine. These methods can classify a population into groups with unique characteristics or health risks and help individuals move from "unhealthy or at-risk" classes to "healthy" classes through intervention.
To date, the potential of using continuously collected data from wearables to explain longevity remains largely unstudied. In this study, we investigate the use of 24-h activity profiles derived from accelerometer data as a novel digital biomarker for longevity and tailored treatment. Our approach differs from previous research in that it applies a data-driven approach to evaluate the association between 24-h accelerometer data and longevity measures in a nationally representative sample. This brings three distinct advantages over existing research. First, we apply population segmentation of wearable-based activity to the general U.S. adult population in the National Health and Nutrition Examination Survey (NHANES) cohort, which increases the generalizability of our findings compared with previous studies of specific populations (e.g., chronic insomnia disorders 30 , middle-aged women 31 ). Second, our ML-based clustering approach includes features depicting a detailed resolution of the 24-h activity profile, which comprehensively captures both daily activity and physiological manifestations of the biological clock (i.e., the circadian rhythm) such as the sleep/wake cycle 32 . In addition, the 24-h activity profile provides detailed information on an individual's daily activity span, including timing and intensity, making it a richer source of information for health monitoring. Last, we examine the relationship between data-driven segmentation and different longevity outcomes that represent various dimensions of the current (i.e., inflammation and mortality) and predicted (i.e., biological age) health states of participants 10,33 .
Data-driven population segmentation and cluster profiling. Applying hierarchical clustering to the wearable-derived hourly average activity data, we identified 22% (n = 1,628) of participants in Cluster 1, 37% (n = 2,670) in Cluster 2, 17% (n = 1,256) in Cluster 3, 8% (n = 558) in Cluster 4, and 16% (n = 1,185) in Cluster 5. We observed distinct 24-h activity attributes by cluster (see Fig. 1). The rest/sleep hours for participants were defined as the period between 23:00 and 07:00, which is consistent with previous research on circadian rhythm and sleep, as well as their associations with various health outcomes 34 . Specifically, Cluster 1 showed a substantially higher activity level than the population average between 11:00 and 22:00 (Z-score: 0.75-0.98). During the rest/sleep period (i.e., 23:00-07:00), activity intensity fell dramatically and reached its nadir at 04:00. Cluster 2 participants showed above-average activity in the early morning between 05:00 and 10:00 (Z-score: 0.25-0.41), followed by activity levels around the population mean during the daytime. We also observed a relatively earlier decline in accelerometer activity starting from 18:00. Cluster 3 exhibited low activity during active hours between 07:00 and 21:00 (Z-score: −0.54 to −0.06) and activity above the population average between late night and early morning (i.e., 23:00-04:00, Z-score: 0.05-0.43). Cluster 4, the smallest cluster, was unique in that its activity rose from 14:00 and remained high throughout the rest/sleep period, reaching its peak at 01:00 (Z-score: 2.44). Activity in Cluster 4 then declined substantially and remained dampened between 06:00 and 14:00, reaching its nadir at 08:00 (Z-score: −0.67). Lastly, Cluster 5 had very low activity at all hours, as shown by its negative Z-scores.
In particular, daytime activity between 12:00 and 21:00 was markedly reduced (Z-score < −1.0) in this cluster compared to the population mean. We used the Student's t-test for continuous variables and the Chi-square test for categorical variables to assess the statistical significance of differences in demographic and socioeconomic characteristics, BMI groups, movement behaviors, sleep quality, and medical history by cluster. Across the data-driven clusters, all variables except asthma differed significantly (see Table 2). Clusters 1 and 4 were on average young adults (median ages 41 and 36). Clusters 2 and 3 included middle-aged adults (median ages 53 and 51). Cluster 5 consisted of an older population aged between 60 and 80 years. Comparing the two middle-aged clusters, Cluster 3 had significantly higher percentages of NH Black participants (18% vs. 8%) and obesity (44% vs. 36%), higher unemployment rates (52% vs. 33%), fewer participants working ≥ 40 h/week (32% vs. 47%), and lower household income than Cluster 2. In addition, Cluster 3 participants reported spending more time in sedentary behaviors and less time in moderate- or vigorous-intensity activities, with a lower proportion (55%) meeting recommended moderate- and vigorous-intensity physical activity (MVPA) guidelines than in Cluster 2 (65%). Cluster 3 had a greater proportion of participants reporting sleep disturbances and clinically diagnosed sleep disorders, as well as a higher prevalence of cardiovascular disease, cancer, stroke, diabetes, hypertension, and arthritis. When comparing the two young adult clusters, Cluster 4 had larger percentages of participants who were male (55% vs. 36%), non-Hispanic Black (20% vs. 11%), unmarried (55% vs. 36%), and obese (40% vs. 31%), as well as lower income levels and a lower family income to poverty ratio than Cluster 1.
Unlike in the middle-aged clusters, we observed no significant differences in the distributions of working ≥ 40 h/week (~ 40%), working < 40 h/week (~ 20%), and unemployed (~ 30%) between these two groups. In addition, there were no significant differences in the prevalence of medical conditions. In the comparison of movement behaviors, participants in Cluster 4 showed a bimodal pattern, with longer durations of both sedentary behavior and moderate- and vigorous-intensity activity than those in Cluster 1. Furthermore, our analysis revealed five distinctive characteristics of Cluster 4: the highest percentages of NH Black participants and current smokers, the lowest family income to poverty ratio, the shortest sleep duration, and the longest MVPA duration. Finally, Cluster 5, the oldest population, had the highest number of medical conditions and reported the longest sleep and sedentary time.
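The segmentation step behind these cluster profiles can be illustrated with a minimal sketch of agglomerative clustering under Ward's minimum-variance criterion with Euclidean distances. This is not the authors' R pipeline: the function names, the toy profiles, and the choice of three clusters are hypothetical, and the naive pairwise search shown here is only practical for very small inputs.

```python
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * gap


def ward_labels(profiles, k):
    """Agglomerate profiles until k clusters remain; return one label each."""
    clusters = [[i] for i in range(len(profiles))]  # start from singletons
    while len(clusters) > k:
        # Pick the pair whose merge inflates within-cluster variance least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [profiles[m] for m in clusters[ij[0]]],
                [profiles[m] for m in clusters[ij[1]]],
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical toy Z-score profiles over four time blocks (not NHANES data).
toy_profiles = [
    [1.0, 0.9, -0.8, -0.9],    # active by day, quiet at night
    [0.9, 1.1, -0.7, -1.0],
    [-0.6, -0.5, 1.2, 1.0],    # quiet by day, active at night
    [-0.7, -0.4, 1.1, 1.3],
    [-1.0, -1.1, -1.0, -0.9],  # low activity throughout
    [-1.2, -1.0, -0.9, -1.1],
]
toy_labels = ward_labels(toy_profiles, k=3)
```

In this toy example the three pairs of similar profiles end up in three separate clusters, mirroring how day-active, night-active, and uniformly low profiles separate in the analysis above.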
Differences in inflammatory biomarkers, biological age, and mortality according to cluster classification. We assessed the associations between data-driven clusters and white blood cell-based inflammatory biomarker levels (see Fig. 2), Klemera-Doubal (KDM) biological age (see Fig. 3), and all-cause mortality (see Fig. 4). Across health-related outcomes, we observed Cluster 1 to perform best and Cluster 5 to perform worst. These associations held even after adjusting for covariates. Specifically, Clusters 3, 4, and 5 had 0.05-0.10 log-unit higher white blood cell counts and 0.08-0.15 log-unit higher neutrophil counts compared to Cluster 1 (see Fig. 2). In addition, Cluster 4 was associated with a 0.05 log-unit higher (95% CI: 0.010-0.085) lymphocyte count. Clusters 3 and 5 were associated with 0.06-0.12 and 0.09-0.14 log-unit increases, respectively, in NLR and in the hematological aggregate indices for systemic inflammation, SII and AISI (all p < 0.05).
For KDM biological age, we observed a progressively greater advance in biological aging from Cluster 3 to Cluster 4 to Cluster 5 (see Fig. 3). Specifically, participants in Cluster 3 had a biological age advance 0.25 log-years (equivalent to 1.28 years, 95% CI: 0.043-0.467) greater than those in Cluster 1. Participants in Clusters 4 and 5 exhibited an even greater biological age advance, at 0.35 log-years (equivalent to 1.42 years, 95% CI: 0.175-0.522) and 0.53 log-years (equivalent to 1.70 years, 95% CI: 0.298-0.760), respectively.
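The year equivalents quoted above are numerically consistent with exponentiating the log-scale estimates. The short check below illustrates that arithmetic only; the dictionary and variable names are hypothetical, and the sketch does not reproduce the underlying models.

```python
import math

# Log-scale estimates as reported in the text, relative to Cluster 1.
log_estimates = {"Cluster 3": 0.25, "Cluster 4": 0.35, "Cluster 5": 0.53}

# Back-transform each estimate from the log scale with exp().
back_transformed = {
    name: round(math.exp(value), 2) for name, value in log_estimates.items()
}

for name, value in back_transformed.items():
    print(f"{name}: exp({log_estimates[name]}) = {value}")
```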

Discussion
We applied a data-driven clustering approach to identify population segments based on 24-h accelerometer activity data collected using a wearable device in U.S. adults. Based on the 24-h activity profiles, we found five distinct clusters, which we describe as follows. Cluster 1 represents a "High activity" group maintaining elevated levels of activity throughout the day. Cluster 2 portrays a "Low activity" group, exhibiting a diurnal pattern similar to that of Cluster 1 but with lower overall activity levels throughout the day and a faster decline from early evening. Clusters 3 and 4 represent the "Mild circadian rhythm (CR) disruption" group and the "Severe CR disruption" group, respectively. Cluster 3 participants exhibit increased nocturnal activity between 23:00 and 04:00, while their daytime activity remains low. Cluster 4 is characterized by extremely low activity from morning to early afternoon, a gradual elevation in the evening, notably high activity during rest/sleep hours, and a sharp fall in the morning. These activity patterns are indicative of circadian misalignment or disrupted rhythm, as they do not align well with normal light-dark schedules. Therefore, we have classified these clusters as having circadian rhythm disruption. Lastly, Cluster 5 represents the "Very low activity" group.
We demonstrated that clusters are significantly associated with baseline characteristics, as determined by t-tests and Chi-square tests. The identified clusters are clearly differentiated by demographic and socioeconomic factors, movement behaviors, and medical conditions. Furthermore, our generalized linear models and Cox proportional hazards models revealed significant associations and gradient effects between cluster membership and three longevity outcomes, namely inflammatory biomarker levels, biological age, and all-cause mortality. Across all health-related outcomes, the "High activity" group (Cluster 1) tended to perform best, with the lowest inflammation levels, biological age advance, and mortality. This was followed by "Low activity" (Cluster 2), "Mild CR disruption" (Cluster 3), and "Severe CR disruption" (Cluster 4). The "Very low activity" group (Cluster 5) performed worst, with the highest inflammation levels, mortality risk, and biological age (see Fig. 5).
There were, however, a few exceptions. "Severe CR disruption", consisting of young adults aged 30-40 years, was significantly associated with increased inflammatory biomarkers and accelerated biological age, but not with all-cause mortality or medical histories. This finding suggests that young adults with circadian misalignment may seem healthy because they have no apparent signs of medical conditions and show high levels of activity, but are in fact undergoing health deterioration and unhealthy aging. In middle-aged adults, having some degree of circadian cycle disturbance together with a low activity level ("Mild CR disruption") resulted in substantially higher inflammatory biomarker levels, mortality risk, and biological age compared to having low activity alone. This highlights the growing importance of circadian alignment in older populations to achieve healthy longevity.
Unlike physical activity or nutrition, there is still a lack of understanding regarding how to utilize or correct biological timing for health benefits. Current public health interventions largely focus on increasing physical activity levels or eating healthily, with less attention to targeting the circadian clock. Mounting evidence indicates that circadian disruption has significant consequences for various health outcomes, including performance, well-being, physical and mental health, and longevity 24,35 . As such, smartwatches and wearables offer a timely, unobstructed, and convenient method for monitoring and assessing circadian rhythms. With the increasing uptake of digital devices, circadian clock-based therapeutics have enormous potential for maximizing health benefits and promoting healthy aging at individual and population levels 36-39 .
Coupled with machine learning algorithms, digitization of such passive behavior data has an unrecognized potential as a novel digital biomarker for longevity and for advancing personalized interventions, automated health event prediction, and population-level prevention. As an implication of this study, wearable data could be used as a digital biomarker, with personalized interventions delivered via digital devices to promote synchronization with the diurnal cycle, i.e., to migrate "unhealthy or at-risk" individuals to "healthy" clusters. Young adults with an impaired circadian cycle, for example, may be given recommendations such as timely light exposure, exercise at specific times, melatonin ingestion, or the use of digital technology for monitoring to improve their sleep-wake cycle 40,41 . Meanwhile, older adults with low activity levels may be advised to increase their physical activity and engage in other healthy behaviors to reduce the risk of age-related diseases and increase strength and mobility.
There are potential limitations of this study. First, the validity of the feature selection must be verified on new data that were not seen by the model during the development phase. Second, this is a retrospective analysis and cannot establish causal relationships underlying the observed associations. Third, we used only 7-day accelerometer data, and a longer duration of monitoring would provide a more precise and accurate classification of clusters. Fourth, unmeasured environmental factors or residual confounding could have affected accelerometry measurements. Similarly, non-wear time and missing accelerometry measures may influence the activity output. However, this impact is likely minimal, as we selected participants with complete 5-min epoch information for the analysis. Next, data on shift work status and work schedule are missing, and it is possible that the clusters we identified are biased toward including shift workers and are therefore not representative of the general population with normal work schedules. However, we believe that shift work status would not fully explain our findings, for two reasons. First, employment status in our data did not differ significantly between the group with severe circadian disruption and the group without disruption. Second, controlling for employment status did not affect the original associations in the generalized linear models and Cox proportional hazards models. Lastly, the initial cost of purchasing a wearable device ($250.00 per unit) may not appear cost-effective from a population perspective in the short term.
However, it could potentially become cost-effective in the long term due to the following reasons: (1) the widespread use of smartphones and smartwatches makes them scalable solutions for continuous data collection in a large population; (2) wearables are more economical in the long run when compared to traditional methods such as clinical visits or lab tests, which require physical encounters and can incur costs for each visit; (3) as technology advances, the availability of low-cost wearable devices and commercial smartwatches with accelerometer functionality is increasing.
Nevertheless, this study offers the following contributions over previous research. This study used wearable-based accelerometer activity data to segment a nationally representative sample of the U.S. population. A novel, detailed resolution of the 24-h activity profile elucidates distinct cluster profiles and highlights that circadian misalignment and rhythm disruption play a critical role in the longevity measures of inflammation, biological age, and mortality. With this work, we add a meaningful contribution to current research in the field by demonstrating the potential for the digitization of human longevity measures based on continuous wearable-based activity data. A digital biomarker for longevity has enormous potential for digital phenotyping, personalized intervention, population-level prevention, and remote monitoring of people's health. It is also a critical step toward achieving the aim of precision medicine. Future studies with prospective and repeated assessments using digital devices are warranted.

Methods
Participants. We utilized data from the NHANES, a nationwide cross-sectional survey conducted by the  Fig. 6).
Serum inflammatory biomarker measures. Blood sample collection, laboratory methods, and detailed processing instructions are described in the NHANES Laboratory/Medical Technologists Procedure Manual 43 . The blood analyzer provided white blood cell count, neutrophil count, lymphocyte count, and neutrophil-to-lymphocyte ratio (NLR). We additionally computed two hematological indexes for systemic inflammation, the systemic immune-inflammation index (SII) and the aggregate index of systemic inflammation (AISI), using the following formulae 44,45 :
• SII = (neutrophil count × platelet count) / lymphocyte count
• AISI = (neutrophil count × monocyte count × platelet count) / lymphocyte count
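As a worked illustration of the formulae above, the following sketch computes NLR, SII, and AISI from a single set of blood counts. The function name and the example values are hypothetical, and the sketch assumes all counts share the same unit (for example, 1000 cells/µL).

```python
def inflammation_indexes(neutrophil, lymphocyte, monocyte, platelet):
    """Return NLR, SII, and AISI for one participant's blood counts."""
    if lymphocyte <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophil / lymphocyte,
        "SII": neutrophil * platelet / lymphocyte,
        "AISI": neutrophil * monocyte * platelet / lymphocyte,
    }


# Hypothetical counts, assumed to be in 1000 cells/uL (not NHANES data).
example = inflammation_indexes(
    neutrophil=4.0, lymphocyte=2.0, monocyte=0.5, platelet=250.0
)
print(example)
```

Because the models in this study use log-transformed outcomes, such indexes would be passed through a logarithm before modeling.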
Biological aging measure. We used the modified Klemera-Doubal method (KDM) for biological age prediction 10,46 . We chose KDM biological age as it has been shown to be more accurate than other alternatives for the prediction of morbidity, mortality, and indicators of health span 47,48 . We included 11 biomarkers in the biological age estimation using the BioAge R package 0.1.0 49 : albumin, alkaline phosphatase, total cholesterol, creatinine, HbA1c, systolic blood pressure, blood urea nitrogen, uric acid, lymphocyte percentage, mean cell volume, and white blood cell count.
Figure 6. Flowchart for inclusion of study participants.
Data processing of wearable-derived accelerometer activity data. All participants aged 6 years and older during the 2011-2012 cycle and all participants aged 3 years and older during the 2013-2014 cycle wore an ActiGraph GT3X+ (ActiGraph, Pensacola, FL) accelerometer on the non-dominant wrist for 7 consecutive 24-h periods. The wearable collected raw signals on the x, y, and z axes at a sampling rate of 80 Hz. NHANES processed, flagged, and summarized accelerometer data at the minute level in Monitor-Independent Movement Summary (MIMS) units, a non-proprietary, open-source, device-independent universal summary metric. We applied a series of quality control and data processing steps to identify valid accelerometer data suitable for our analysis. First, we included accelerometer data from participants who wore the accelerometer for 16 h or more per day for at least 4 days, not including the first day of wear, which was excluded from data processing. Previous research indicates that for population-level analyses, 16 h of wear time for 4 or more days is sufficient to generate stable group-level estimates of activity using accelerometer data 51 54,55 .
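The wear-time inclusion rule described above, together with the hourly averaging of minute-level values that underlies the hourly activity profiles, can be sketched as follows. The function names and the input layout (one wear-hour total per day, first day first, and a flat list of minute-level values) are assumptions made for illustration and do not reflect the NHANES file format.

```python
def is_eligible(daily_wear_hours, min_hours=16.0, min_days=4):
    """Apply the wear-time rule: drop day 1, then require enough valid days."""
    later_days = daily_wear_hours[1:]  # the first day of wear is excluded
    valid_days = sum(1 for hours in later_days if hours >= min_hours)
    return valid_days >= min_days


def hourly_means(minute_values):
    """Average minute-level values into consecutive 60-minute blocks."""
    if len(minute_values) % 60 != 0:
        raise ValueError("expected a whole number of hours")
    return [
        sum(minute_values[start:start + 60]) / 60
        for start in range(0, len(minute_values), 60)
    ]


# Hypothetical wear-hour totals for a 7-day protocol (day 1 listed first).
print(is_eligible([24, 20, 18, 16, 17, 10, 9]))    # four valid later days
print(is_eligible([24, 24, 24, 24, 10, 10, 10]))   # only three valid later days
```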
We then applied recursive feature elimination to identify an optimal set of features from the aforementioned 24 input features of activity levels that significantly separate clusters in our data (see Supplementary Fig. 1). Using only the 16 selected features, we applied hierarchical clustering with Ward's linkage algorithm and Euclidean distances for population segmentation of wearable-based accelerometer activity data in U.S. adults (see Supplementary Fig. 1). All analysis was performed using R software 4.
Covariates. Based on previous research 36,56,57 , we selected a priori additional characteristics expected to be associated with inflammatory biomarkers, biological age, and mortality: age, sex, race/ethnicity, family income to poverty ratio, education, marital status, employment status, household income, smoking status, sleep hours and quality, and history of cardiovascular disease, cancer, stroke, diabetes, hypertension, asthma, and arthritis. We calculated the body mass index (BMI) by dividing weight in kilograms by height in meters squared. BMI was further categorized into three groups: Normal weight (BMI < 25), Overweight (BMI 25-29.9), and Obese (BMI ≥ 30). Durations of movement behaviors, namely sleep, sedentary behavior, and moderate-intensity and vigorous-intensity physical activity, were assessed by self-report. We categorized participants as having sufficient moderate- and vigorous-intensity physical activity (MVPA) if they met the Physical Activity Guidelines for Americans (i.e., 150 min or more of moderate-intensity activity per week or 75 min or more of vigorous-intensity activity per week) 58 .
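The BMI grouping and the MVPA rule stated above translate directly into two small functions. This is a minimal sketch with hypothetical function names; it treats any BMI from 25 up to, but not including, 30 as overweight so that values such as 29.95 are not left unclassified, and it applies the guideline exactly as worded here (either threshold alone is sufficient).

```python
def bmi_category(weight_kg, height_m):
    """Classify BMI (kg divided by metres squared) into the three groups."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def meets_mvpa_guideline(moderate_min_per_week, vigorous_min_per_week):
    """True if either weekly threshold (150 moderate or 75 vigorous) is met."""
    return moderate_min_per_week >= 150 or vigorous_min_per_week >= 75


# Hypothetical participants (not NHANES data).
print(bmi_category(70, 1.75), meets_mvpa_guideline(150, 0))
print(bmi_category(100, 1.75), meets_mvpa_guideline(100, 60))
```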

Statistical analysis.
To account for the complex survey design and produce representative estimates of the U.S. population, we applied four-year survey weights to all statistical procedures using the survey package 4.1-1 to adjust for unequal selection probability and non-response bias in accordance with NHANES analytical guidelines 59 . For descriptive statistics, we obtained the population means, proportions, and standard errors (SE) for the entire sample (see Table 1) and by cluster (see Table 2). We conducted Student's t-tests for continuous variables and Chi-square tests for categorical variables to compare baseline characteristics by cluster.
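As a rough illustration of the weighting step, the sketch below derives a four-year weight from a two-year weight and computes a weighted mean. It assumes the usual NHANES convention for combining two 2-year cycles, in which each two-year weight is halved; the function names and toy values are hypothetical, and the sketch does not reproduce the design-based standard errors that the survey package provides.

```python
def four_year_weight(two_year_weight):
    """Halve a two-year weight when two survey cycles are pooled (assumed rule)."""
    return two_year_weight / 2


def weighted_mean(values, weights):
    """Weighted mean: sum of weight times value, divided by sum of weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for v, w in zip(values, weights)) / total_weight


# Hypothetical values and two-year weights (not NHANES data).
toy_values = [1.0, 3.0]
toy_weights = [four_year_weight(w) for w in (2000.0, 6000.0)]
print(weighted_mean(toy_values, toy_weights))
```

Halving every weight leaves a weighted mean unchanged; the rescaling matters for estimated population totals.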
For associations of clusters with serum inflammatory biomarkers (i.e., white blood cell count, neutrophil count, lymphocyte count, NLR, SII, and AISI) and the Klemera-Doubal method-based biological age, we used survey-weighted generalized linear models with and without adjustment for covariates (see Figs. 2 and 3). Because their distributions were skewed, dependent variables were log-transformed in these models. In addition, we depicted the differences in all-cause mortality by cluster in a weighted Kaplan-Meier curve using the R package adjustedCurves 0.9.1 (see Fig. 4a). We further fitted a survey-weighted Cox proportional hazards model adjusting for covariates to estimate HRs and 95% CIs for associations between clusters and all-cause mortality (see Fig. 4b). The proportional hazards assumption was satisfied. Based on backward selection, we included age, sex, race/ethnicity, and employment status in adjusted models. We conducted sensitivity analyses to check for interactions between clusters and covariates, and no effect modification was observed. Statistical significance was set at two-sided p < 0.05.
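The idea of a weighted Kaplan-Meier curve can be illustrated with a minimal product-limit estimator in which each participant contributes their weight to the at-risk and event totals. This is a simplified sketch, not the adjustedCurves implementation: the function name and toy data are hypothetical, and it ignores covariate adjustment and design-based variance estimation.

```python
def weighted_kaplan_meier(times, events, weights):
    """Return (time, survival) pairs at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Weighted number still under observation just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of deaths occurring exactly at time t.
        deaths = sum(
            w for ti, e, w in zip(times, events, weights) if ti == t and e
        )
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times, event flags (1 = died, 0 = censored),
# and equal weights (not NHANES data).
toy_curve = weighted_kaplan_meier(
    times=[1, 2, 3, 4], events=[1, 0, 1, 1], weights=[1.0, 1.0, 1.0, 1.0]
)
print(toy_curve)
```

With equal weights the estimator reduces to the ordinary Kaplan-Meier curve, which makes the toy output easy to verify by hand.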

Data availability
The NHANES data that support the findings of this study are available from the Centers for Disease Control and Prevention (CDC) website [https://wwwn.cdc.gov/nchs/nhanes/Default.aspx].